Finding Similar RSS News Articles Using Correlation-Based Phrase Matching
Abstract
Traditional phrase matching approaches, which discover documents containing exactly the same phrases, fail to detect documents that include phrases that are semantically relevant but not exact matches. We propose a correlation-based phrase matching (CPM) model that detects RSS news articles containing not only phrases that are exactly the same but also phrases that are semantically relevant; these matches dictate the degree of similarity of any two articles. As the number of RSS news feeds on the Internet continues to increase, our CPM approach becomes more significant, since it minimizes the workload of the user, who would otherwise have to scan through a huge number of news articles to find related articles of interest, a tedious and often impossible task. Experimental results show that our CPM model, applied to matching bigrams and trigrams, outperforms other phrase matching approaches, including keyword matching.
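To make the contrast between exact and correlation-based matching concrete, here is a minimal Python sketch. It is not the CPM model itself: the abstract does not specify how word or phrase correlations are computed, so the `WORD_CORR` table, its values, the position-wise product in `phrase_corr`, and the averaging in `correlated_similarity` are all illustrative assumptions.

```python
# Hypothetical word-correlation table (symmetric, values in [0, 1]).
# These numbers are made up for illustration; a real system would
# derive them from a large corpus.
WORD_CORR = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("crash", "accident")): 0.8,
}


def word_corr(a, b):
    """Correlation of two words: 1.0 if identical, else a table lookup."""
    if a == b:
        return 1.0
    return WORD_CORR.get(frozenset((a, b)), 0.0)


def ngrams(text, n):
    """Return the list of word n-grams (as tuples) in a text."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def phrase_corr(p, q):
    """Assumed phrase correlation: product of position-wise word correlations."""
    score = 1.0
    for a, b in zip(p, q):
        score *= word_corr(a, b)
    return score


def exact_similarity(doc_a, doc_b, n=2):
    """Fraction of doc_a's distinct n-grams that appear verbatim in doc_b."""
    a, b = set(ngrams(doc_a, n)), set(ngrams(doc_b, n))
    if not a:
        return 0.0
    return len(a & b) / len(a)


def correlated_similarity(doc_a, doc_b, n=2):
    """Average, over doc_a's n-grams, of the best phrase correlation in doc_b."""
    a, b = ngrams(doc_a, n), ngrams(doc_b, n)
    if not a or not b:
        return 0.0
    return sum(max(phrase_corr(p, q) for q in b) for p in a) / len(a)


doc_a = "car crash downtown"
doc_b = "automobile accident downtown"
print(exact_similarity(doc_a, doc_b))       # no bigram is shared verbatim
print(correlated_similarity(doc_a, doc_b))  # related bigrams still contribute
```

With these two toy headlines, exact bigram matching finds no overlap, while the correlation-based score is positive because "car crash" and "automobile accident" are treated as related phrases under the assumed table.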
Similar resources
Utilizing phrase-similarity measures for detecting and clustering informative RSS news articles
As the number of RSS news feeds on the Internet continues to increase, it becomes necessary to minimize the workload of the user, who would otherwise have to scan through a huge number of news articles to find related articles of interest, a tedious and often impossible task. In order to solve this problem, we present a novel approach, called InFRSS, which consists of a correlation-b...
Synthesizing correlated RSS news articles based on a fuzzy equivalence relation
Tens of thousands of news articles are posted on-line each day, covering topics from politics to science to current events. To better cope with this overwhelming volume of information, RSS (news) feeds are used to categorize newly posted articles. Nonetheless, most RSS users must filter through many articles within the same or different RSS feeds to locate articles pertaining to their particula...
Generating Fuzzy Equivalence Classes on RSS News Articles for Retrieving Correlated Information
Tens of thousands of news articles are posted on-line each day, covering topics from politics to science to current events. In order to better cope with this overwhelming volume of information, RSS (news) feeds are used to categorize newly posted articles. Nonetheless, most RSS users must filter through many articles within the same or different RSS feeds in order to locate articles pertaining ...
Eliminating Redundant and Less-Informative RSS News Articles Based on Word Similarity and a Fuzzy Equivalence Relation
Ian Garcia, Department of Computer Science, Master of Science. The Internet has marked this era as the information age. There is no precedent for the amazing amount of information, especially network news, that can be accessed by Internet users these days. As a result, the problem ...
Cobra: Content-based Filtering and Aggregation of Blogs and RSS Feeds
Blogs and RSS feeds are becoming increasingly popular. The blogging site LiveJournal has over 11 million user accounts, and according to one report, over 1.6 million postings are made to blogs every day. The “Blogosphere” is a new hotbed of Internet-based media that represents a shift from mostly static content to dynamic, continuously-updated discussions. The problem is that finding and tracki...
Publication date: 2007